On  computing  global  similarity  in  images* 

S.  Ravela  and  R.  Manmatha 
Computer  Science  Department 
University  of  Massachusetts,  Amherst,  MA  01003 
Email: {ravela, manmatha}@cs.umass.edu


Abstract 

The  retrieval  of  images  based  on  their  visual  similarity 
to  an  example  image  is  an  important  and  fascinating  area  of 
research.  Here,  a  method  to  characterize  visual  appearance 
for  determining  global  similarity  in  images  is  described. 

Images are filtered with Gaussian derivatives and geometric
features are computed from the filtered images.
The geometric features used here are curvature and phase.
Two images may be said to be similar if they have similar
distributions of such features. Global similarity may,
therefore, be deduced by comparing histograms of these
features. This allows for rapid retrieval, and examples from
collections of gray-level and trademark images are shown.

1  Introduction 

The advent of large multi-media collections and digital
libraries has led to a need for good search tools to index
and retrieve information from them. For text available
in machine readable form (ASCII) a number of good
search engines are available. However, there are as yet no
good tools to retrieve images. The traditional approach to
searching and indexing images using manual annotations
is slow, labor intensive and expensive. In addition, textual
annotations cannot encode all the information available in
an image. There is thus a need for retrieving images using
their content. The indexing and retrieval of images
using their content is a difficult problem. A person using
an image retrieval system usually seeks to find semantically
relevant information. This entails solutions to such
hard problems as automatic segmentation, robust feature
detection and recognition, all of which are as yet unsolved.
However, many image attributes like color, texture, shape
and “appearance” are often directly correlated with the
semantics of the problem. For example, logos or product
packages (e.g., a box of Tide) have the same color wherever
they are found. The coat of a leopard has a unique
texture while Abraham Lincoln’s appearance is uniquely
defined. These image attributes can often be used to index
and retrieve images.

A common model for retrieval, and one that is adopted
here, is that images in the database are processed and
described by a set of feature vectors. A priori these vectors
are indexed. During run-time, a query is provided in the
form of an example image and its features are compared
with those stored. Images are then retrieved in the order
indicated by the comparison operator. In this paper,
objects similar in visual appearance to a given query object
are retrieved by comparing with a set of database images
using a characterization of their image intensity surfaces.
Arguably an object’s visual appearance in an image
is closely related to several factors including, among
others, its three dimensional shape, albedo, surface texture
and the imaged viewpoint. It is non-trivial to separate the
different factors constituting an object’s appearance. For
example, the face of a person has a unique appearance that
cannot just be characterized by the geometric shape of the
‘component parts’. In this paper a characterization of the
shape of the intensity surface of imaged objects is used for
retrieval. The experiments conducted show that retrieved
objects have similar visual appearance, and henceforth an
association is made between ‘appearance’ and the shape of
the intensity surface.

Specifically, this paper focuses on a representation for
computing global similarity. That is, the task is to find
images that, as a whole, appear visually similar. The utility
of global similarity retrieval is evident, for example, in
finding similar scenes or similar faces in a face database.
In addition, practical applications such as finding similar
trademarks in a trademark database significantly benefit
from global similarity retrieval.

The image intensity surface is robustly characterized
using features obtained from responses to multi-scale
Gaussian derivative filters. Koenderink [8] and others [3]
have argued that the local structure of an image can be
represented by the outputs of a set of Gaussian derivative
filters applied to an image. That is, images are filtered
with Gaussian derivatives at several scales and the resulting
response vector locally describes the structure of the
intensity surface. By computing features derived from the
local response vector and accumulating them over the image,
robust representations appropriate to querying images
as a whole (global similarity) can be generated. One such
representation uses histograms of features derived from
the multi-scale Gaussian derivatives. Histograms form a
global representation because they capture the distribution
of local features (a histogram is one of the simplest ways
of estimating a non-parametric distribution). This global
representation can be used efficiently for global similarity
retrieval by appearance, and retrieval is very fast.

The choice of features often determines how well the
image retrieval system performs. Here the task is to robustly
characterize the 3-dimensional intensity surface. A
3-dimensional surface is uniquely determined if the local
curvatures everywhere are known. Thus, it is appropriate
that one of the features be local curvature. The principal
curvatures of the intensity surface are invariant to image
plane rotations and monotonic intensity variations and, further,
their ratios are in principle insensitive to scale variations of
the entire image. However, spatial orientation information
is lost when constructing histograms of curvature (or ratios
thereof) alone. Therefore we augment the local curvature
with local phase, and the representation uses histograms of
local curvature and phase.

Local principal curvatures and phase are computed
at several scales from responses to multi-scale Gaussian
derivative filters. Then histograms of the curvature ratios
[7, 1] and phase are generated. Thus, the image is
represented by a single vector (multi-scale histograms).
During run-time the user presents an example image as
a query and the query histograms are compared with the
ones stored, and the images are then ranked and displayed
in order to the user.

The rest of the paper is organized as follows. Section 2
surveys related work in the literature. In Section 3, the notion
of appearance is developed further and characterized
using Gaussian derivative filters and the derived global
representation is discussed. Comparisons are made in the
context of trademark retrieval with the traditional moment
invariants. A discussion and conclusion follows in Section 4.

2  Related  Work

Several authors have tried to characterize the appearance
of an object via a description of the intensity surface.
In the context of object recognition, the authors of [14] represent
the appearance of an object using a parametric eigen space
description. This space is constructed by treating the image
as a fixed length vector, and then computing the principal
components across the entire database. The images therefore
have to be size and intensity normalized, segmented
and trained. Similarly, using principal component representations
described in [5], face recognition is performed
in [19]. In [17] the traditional eigen representation is augmented
by using most discriminant features and is applied
to image retrieval. The authors apply eigen representation
to retrieval of several classes of objects. The issue, however,
is that these classes are manually determined and
training must be performed on each. The approach presented
in this paper is different from all the above because
eigen decompositions are not used at all to characterize
appearance. Further, the method presented uses no learning
and does not require constant sized images. It should
be noted that although learning significantly helps in such
applications as face recognition, it may not be
feasible in many instances where sufficient examples are
not available. This system is designed to be applied to a
wide class of images and there is no restriction per se.

In earlier work we showed that local features computed
using Gaussian derivative filters can be used for local similarity,
i.e. to retrieve parts of images [12]. Here we argue
that global similarity can be determined by computing local
features and comparing distributions of these features.
This technique gives good results, and is reasonably tolerant
to view variations. Schiele and Crowley [16] used such
a technique for recognizing objects using grey-level images.
Their technique used the outputs of Gaussian derivatives
as local features. A multi-dimensional histogram of
these local features is then computed. Two images are considered
to be of the same object if they have similar histograms.
The difference between this approach and the
one presented by Schiele and Crowley is that here we use
1D histograms (as opposed to multi-dimensional) and further
use the principal curvatures as the primary feature.

The use of Gaussian derivative filters to represent appearance
is motivated by their use in describing the spatial
structure [8] and their uniqueness in representing the scale
space of a function [9, 6, 21, 18]. The invariance properties
of the principal curvatures are well documented in [3].

In the context of global similarity retrieval it should be
noted that representations using moment invariants have
been well studied [13]. In these methods a global representation
of appearance may involve computing a few numbers
over the entire image. Two images are then considered
similar if these numbers are close to each other (say using
an L2 norm). We argue that such representations are not
able to really capture the “appearance” of an image, particularly
in the context of trademark retrieval where moment
invariants are widely used. In other work [12] we
compared moment invariants with the technique presented
here and found that moment invariants work best for a single
binary shape without holes in it, and, in general, fare
worse than the method presented here.

Texture based image retrieval is also related to the appearance
based work presented in this paper. Using Wold
modeling, the authors of [10] try to classify the entire Brodatz
texture set and in [4] attempt to classify scenes, such as
city and country. Of particular interest is work by [11] who
use Gabor filters to retrieve texture-similar images.

The earliest general image retrieval systems were designed
by [2, 15]. In [2] the shape queries require prior
manual segmentation of the database which is undesirable
and not practical for most applications.


3  Global  representation  of  appearance

Three steps are involved in computing global
similarity. First, local derivatives are computed at several
scales. Second, derivative responses are combined to generate
local features, namely the principal curvatures and
phase, and their histograms are generated. Third, the 1D
curvature and phase histograms generated at several scales
are matched. These steps are described next.

A. Computing local derivatives: Computing derivatives
using finite differences does not guarantee stability
of derivatives. In order to compute derivatives stably, the
image must be regularized, or smoothed or band-limited.
A Gaussian filtered image $I_\sigma = I * G$, obtained by
convolving the image $I$ with a normalized Gaussian $G(r, \sigma)$,
is a band-limited function. Its high frequency components
are eliminated and derivatives will be stable. In fact, it has
been argued by Koenderink and van Doorn [8] and others
[3] that the local structure of an image $I$ at a given scale
can be represented by filtering it with Gaussian derivative
filters (in the sense of a Taylor expansion), and they term
it the N-jet.

However, the shape of the smoothed intensity surface
depends on the scale at which it is observed. For example,
at a small scale the texture of an ape’s coat will be
visible. At a large enough scale, the ape’s coat will appear
homogeneous. A description at just one scale is likely to
give rise to many accidental mismatches. Thus it is desirable
to provide a description of the image over a number
of scales, that is, a scale space description of the image. It
has been shown by several authors [9, 6, 21, 18, 3] that, under
certain general constraints, the Gaussian filter forms a
unique choice for generating scale-space. Thus local spatial
derivatives are computed at several scales.

B. Feature Histograms: The normal and tangential curvatures
of a 3-D surface (X, Y, Intensity) are defined in [3]
in terms of local image derivatives, where $I_x(p, \sigma)$ and
$I_y(p, \sigma)$ are the local derivatives of image $I$ around
point $p$ using Gaussian derivatives at scale $\sigma$. Similarly,
$I_{xx}(\cdot, \cdot)$, $I_{xy}(\cdot, \cdot)$ and $I_{yy}(\cdot, \cdot)$
are the corresponding second derivatives. The normal curvature $N$ and
tangential curvature $T$ are then combined [7] to generate a
shape index $C$.
The index value $C$ is $\pi/2$ when $N = T$ and is undefined
when $N$ and $T$ are both zero, in which case it is not
computed. This is interesting because very flat portions of
an image (or ones with constant ramp) are eliminated. For
example in Figure 2 (middle row), the background in most
of these face images does not contribute to the curvature
histogram. The curvature index or shape index is rescaled
and shifted to the range [0, 1] as is done in [1]. A histogram
is then computed of the valid index values over an entire
image.

The second feature used is phase. The phase is simply
defined as $P(p, \sigma) = \mathrm{atan2}(I_y(p, \sigma), I_x(p, \sigma))$. Note
that $P$ is defined only at those locations where $C$ is and is
ignored elsewhere. As with the curvature index, $P$ is rescaled
and shifted to lie in the interval [0, 1].
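
As an illustration of the phase feature, the following is a minimal Python sketch, not the authors' implementation. It assumes a sampled Gaussian truncated at three standard deviations, replicated border pixels, and the linear mapping (P + pi) / (2 pi) onto [0, 1]; none of these details are stated in the paper. The function names are hypothetical, and the restriction of P to locations where C is defined is omitted for brevity.

```python
import math

def gaussian_kernels(sigma):
    """Sampled 1-D Gaussian g and its first derivative dg at scale sigma."""
    radius = int(math.ceil(3 * sigma))  # truncation radius: an assumption
    offsets = list(range(-radius, radius + 1))
    g = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    total = sum(g)
    g = [v / total for v in g]  # normalize so the smoothing kernel sums to 1
    dg = [-k / (sigma * sigma) * v for k, v in zip(offsets, g)]
    return offsets, g, dg

def convolve_at(image, y, x, offsets, ky, kx):
    """Separable 2-D convolution at one pixel; ky acts along y, kx along x.
    Border pixels are replicated (an assumption)."""
    h, w = len(image), len(image[0])
    acc = 0.0
    for dy, wy in zip(offsets, ky):
        yy = min(max(y - dy, 0), h - 1)
        for dx, wx in zip(offsets, kx):
            xx = min(max(x - dx, 0), w - 1)
            acc += wy * wx * image[yy][xx]
    return acc

def rescaled_phase(image, y, x, sigma):
    """Phase atan2(Iy, Ix) at pixel (y, x), mapped from [-pi, pi] to [0, 1]."""
    offsets, g, dg = gaussian_kernels(sigma)
    ix = convolve_at(image, y, x, offsets, g, dg)  # derivative along x
    iy = convolve_at(image, y, x, offsets, dg, g)  # derivative along y
    phase = math.atan2(iy, ix)
    return (phase + math.pi) / (2.0 * math.pi)

# An intensity ramp increasing along x has a gradient pointing along +x.
ramp_x = [[float(x) for x in range(15)] for _ in range(15)]
print(rescaled_phase(ramp_x, 7, 7, 1.0))
```

For a ramp increasing along x the gradient direction is zero radians, which this mapping places at the middle of the [0, 1] interval.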

Although the curvature and phase histograms are in
principle insensitive to variations in scale, in early experiments
we found that computing histograms at multiple
scales dramatically improved the results. An explanation
for this is that at different scales different local
structures are observed and, therefore, multi-scale histograms
are a more robust representation. Consequently,
a feature vector is defined for an image $I$ as the vector
$V_i = \langle H_c(\sigma_1), \ldots, H_c(\sigma_n), H_p(\sigma_1), \ldots, H_p(\sigma_n) \rangle$, where
$H_c$ and $H_p$ are the curvature and phase histograms respectively.
We found that using 5 scales gives good results and
the scales are $1 \ldots 4$ in steps of half an octave.
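
The construction of the multi-scale feature vector can be sketched in Python as follows. This is a hedged illustration only: the number of bins, the normalization of each histogram by its sample count, and the order of concatenation are assumptions, and the function names are hypothetical. Five scales from 1 to 4 in half-octave steps correspond to multiplying by the square root of 2 at each step.

```python
def half_octave_scales(first=1.0, count=5):
    """Scales spaced half an octave apart: each is sqrt(2) times the previous."""
    return [first * 2.0 ** (k / 2.0) for k in range(count)]

def histogram(values, bins=10):
    """Normalized 1-D histogram of feature values already rescaled to [0, 1].
    The bin count and the normalization are assumptions."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)  # v == 1.0 falls in the last bin
        counts[idx] += 1
    total = float(len(values)) or 1.0  # avoid dividing by zero when empty
    return [c / total for c in counts]

def feature_vector(curvature_by_scale, phase_by_scale, bins=10):
    """Concatenate curvature histograms, then phase histograms, over all scales."""
    vec = []
    for values in curvature_by_scale:
        vec.extend(histogram(values, bins))
    for values in phase_by_scale:
        vec.extend(histogram(values, bins))
    return vec

# Approximately 1, 1.41, 2, 2.83, 4
print(half_octave_scales())
```

Each entry of `curvature_by_scale` and `phase_by_scale` is the list of valid, rescaled feature values collected over the whole image at one scale.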

C. Matching feature histograms: Two feature vectors
are compared using normalized cross-covariance defined
as $d_{ij} = \frac{V_i^{(m)} \cdot V_j^{(m)}}{\| V_i^{(m)} \| \, \| V_j^{(m)} \|}$, where $V_i^{(m)} = V_i - \mathrm{mean}(V_i)$.

Retrieval is carried out as follows. A query image is selected
and the query histogram vector $V_q$ is correlated with
the database histogram vectors $V_i$ using the above formula.
Then the images are ranked by their correlation score and
displayed to the user. In this implementation, and for evaluation
purposes, the ranks are computed in advance, since
every query image is also a database image.
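
The matching and ranking step can be sketched in Python as below. This is a minimal illustration assuming the usual form of normalized cross-covariance (the dot product of the mean-subtracted vectors divided by the product of their norms); the function names and the dictionary layout of the database are hypothetical.

```python
import math

def normalized_cross_covariance(u, v):
    """Dot product of mean-subtracted vectors divided by the product of their norms."""
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    du = [a - mu_u for a in u]
    dv = [b - mu_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    # A constant vector has zero variance; score it 0 (an assumption).
    return num / den if den > 0 else 0.0

def rank_database(query_vec, database):
    """Return image names sorted by decreasing score against the query vector."""
    scores = [(normalized_cross_covariance(query_vec, vec), name)
              for name, vec in database.items()]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scores]

database = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 2, 4, 3]}
print(rank_database([1, 2, 3, 4], database))
```

Because each image is reduced to one short vector, ranking a query costs one pass over the stored vectors, which is consistent with the fast retrieval described above.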

3.1  Experiments 

The curvature-phase method is tested using two
databases. The first is a trademark database of 2048 images
obtained from the US Patent and Trademark Office
(PTO). The images obtained from the PTO are large, binary
and are converted to gray-level and reduced for the
experiments. The second database is a collection of 1561
assorted gray-level images. This database has digitized
images of cars, steam locomotives, diesel locomotives,
apes, faces, people embedded in different background(s)
and a small number of other miscellaneous objects such
as houses. These images were obtained from the Internet
and the Corel photo-cd collection and were taken with several
different cameras of unknown parameters, and under
varying uncontrolled lighting and viewing geometry.

In the following experiments an image is selected and
submitted as a query. The objective of this query is stated
and the relevant images are decided in advance. Then the
retrieval instances are gauged against the stated objective.
In general, objectives of the form ‘extract images similar
in appearance to the query’ will be posed to the retrieval
algorithm. A measure of the performance of the retrieval
engine can be obtained by examining the recall/precision
table for several queries. Briefly, recall is the proportion
of the relevant material actually retrieved and precision is
the proportion of retrieved material that is relevant [20].
It is a standard widely used in the information retrieval
community and is one that is adopted here.

Queries were submitted to each of the trademark and assorted
image collections for the purpose of computing recall/precision.
The judgment of relevance is qualitative.
For each query in both databases the relevant images were
decided in advance. These were restricted to 48. The top
48 ranks were then examined to check the proportion of retrieved
images that were relevant. All images not retrieved
within 48 were assigned a rank equal to the size of the
database. That is, they are not considered retrieved. These
ranks were used to interpolate and extrapolate precision at
all recall points. In the case of assorted images relevance is
easier to determine and more similar for different people.
However, in the trademark case it can be quite difficult and
therefore the recall-precision can be subject to some error.
The recall/precision results are summarized in Table 1 and
both databases are individually discussed below.

Table 1: Precision at standard recall points for six Queries

Figure 1 shows the performance of the algorithm on the
trademark images. Each strip depicts the top 8 retrievals,
given the leftmost as the query. Most of the shapes have
roughly the same structure as the query. Note that outline
and solid figures are treated similarly (see rows one
and two in Figure 1). Six queries were submitted for the
purpose of computing recall-precision in Table 1.

Figure 1: Trademark retrieval using Curvature and Phase

Experiments are also carried out with assorted gray
level images. Six queries submitted for recall-precision
are shown in Figure 2. The leftmost image in each row is
the query and is also the first retrieved. The rest, from left
to right, are seven retrievals depicted in rank order. Note
that flat portions of the background are never considered
because the principal curvatures are very close to zero and
therefore do not contribute to the final score. Thus, for
example, the flat background in Figure 2 (second row) is
not used. Notice that visually similar images are retrieved
even when there is some change in the background (row
1). This is because the dominant object contributes most
to the histograms. In using a single scale poorer results are
achieved and background affects the results more significantly.

Figure 2: Image retrieval using Curvature and Phase

The results of these examples are discussed below, with
the precision over all recall points depicted in parentheses.
For comparison the best text retrieval engines have an
average precision of 50%:

1. Find similar cars (65%). Pictures of cars viewed from
similar orientations appear in the top ranks because
of the contribution of the phase histogram. This result
also shows that some background variation can
be tolerated. The eighth retrieval, although a car, is a
mismatch and is not considered.

2. Find same face (87.4%) and find similar faces: In the
face query the objective is to find the same face. In
experiments with a University of Bern face database
of 300 faces with 10 relevant faces each, the average
precision over all recall points for all 300 queries was
78%. It should be noted that the system presented
here works well for faces with the same representation
and parameters used for all the other databases.
There is no specific “tuning” or learning involved to
retrieve faces. The query “find similar faces” resulted
in a 100% precision at 48 ranks because there are far
more faces than 48. Therefore, it was not used in the
final precision computation.

3. Find dark textured apes (64.2%). The ape query results
in several other light textured apes and country
scenes with similar texture. Although these are not
mismatches they are not consistent with the intent of
the query which is to find dark textured apes.

4. Find other patas monkeys (47.1%). Here there are
16 patas monkeys in all and 9 within a small view
variation. However, here the whole image is being
matched so the number of relevant patas monkeys is
16. The precision is low because the method cannot
distinguish between light and dark textures, leading
to irrelevant images. Note that it finds other apes,
dark textured ones, but those are deemed irrelevant
with respect to the query.

5. Given a wall with a Coca Cola logo find other Coca
Cola images (63.8%). This query clearly depicts the
limitation of global matching. Although all three
database images that had a certain texture of the wall
(and also had Coca Cola logos) were retrieved (100%
precision), two other very dissimilar images with Coca
Cola logos were not.

6. Scenes with Bill Clinton (72.8%). The retrieval in this
case results in several mismatches. However, three of
the four are retrieved in succession at the top and the
scenes appear visually similar.

While the queries presented here are not “optimal” with
respect to the design constraints of global similarity retrieval,
they are, however, realistic queries that can be posed
to the system. Mismatches can and do occur. The first
is the case where the global appearance is very different.
The Coca Cola retrieval is a good example of this. Second,
mismatches can occur at the algorithmic level. Histograms
coarsely represent spatial information and therefore
will admit images with non-trivial deformations. The
recall/precision presented here compares well with text retrieval.
The time per retrieval is of the order of milliseconds.
In ongoing work we are experimenting with a
database of 63000 images and the amount of time taken
to retrieve is still less than a second. The space required
is also a small fraction of the database. These are the primary
advantages of global similarity retrieval: low storage
and high speed retrieval with good recall/precision.
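
The recall and precision measures used in this section can be illustrated with a short Python sketch. It follows the definitions quoted from [20]; the averaging function is the common non-interpolated variant and is shown only as an assumption, since the paper interpolates precision at standard recall points. Function names are hypothetical.

```python
def precision_recall_at(ranked, relevant, k):
    """Precision and recall after examining the top k retrieved items."""
    top = ranked[:k]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / float(len(top)) if top else 0.0
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears.
    Relevant items that are never retrieved contribute zero."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / float(rank)
    return total / len(relevant)

ranked = ["a", "x", "b", "y"]   # retrieval order for one query
relevant = {"a", "b"}           # images judged relevant in advance
print(precision_recall_at(ranked, relevant, 2))
print(average_precision(ranked, relevant))
```

Truncating `ranked` to the top 48 entries before calling these functions would mimic the cutoff used in the experiments, with relevant images outside the cutoff treated as not retrieved.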

4  Conclusions  and  Limitations 

This paper demonstrates retrieval of similar objects on
the basis of their visual appearance. Visual appearance
is characterized using filter responses to Gaussian derivatives
over scale space. In addition, we claim that global
representations are better constructed by representing the
distribution of robustly computed local features. Currently
we are investigating two issues. First is to scale the
database up to about 100000 images and second is to provide
a mechanism for combining global and local similarity
matching in a single framework.